Reflective DLL Injection

Table of contents

Overview
Blueprint
Store the DLL content in memory
Inject the DLL
Perform image base relocations
- Symbol Table
- Perform relocation
Process imported functions
TLS Callback and DLL Main
- TLS Callback
- DLL Main
GetProcAddress

Overview

Reflective DLL Injection is a process injection technique that allows an attacker to inject a DLL stored in memory rather than on disk.

Indeed, any DLL stored on disk can be easily loaded using the LoadLibrary Windows API. However, this API does not work when the DLL content is stored in memory.

Moreover, the LoadLibrary API raises some kernel events, as shown in the following figure:

[Figure: kernel events raised by a LoadLibrary call]

Reflective DLL Injection amounts to reimplementing a custom LoadLibrary function that does not raise these kernel events, which limits detection by the Blue Team.

Blueprint

The Reflective DLL Injection can be done through the following steps:

Store the DLL content in memory
Parse the DLL header to retrieve the SizeOfImage value
Allocate a memory space whose size is equal to the DLL SizeOfImage
Copy each header and section in the allocated memory space
Perform image base relocations if needed
Load the dependency DLLs, i.e. the DLLs used by the loaded DLL
Resolve imported functions and populate the Import Address Table (IAT)
Protect the memory sections according to the needs of each DLL section
Run the DLL TLS callbacks
Run the DLL main function
Enjoy!

Store the DLL content in memory

This step is quite simple. The goal is to get a buffer with the whole DLL file's content in it. The following code can be used to load a full file in memory:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// DEBUG is assumed to be a printf-style logging macro; a simple fallback is defined here
#ifndef DEBUG
#define DEBUG(...) fprintf(stderr, __VA_ARGS__)
#endif

BOOL FileExistsW(LPCWSTR szPath){
    DWORD dwAttrib = GetFileAttributesW(szPath);
    return (dwAttrib != INVALID_FILE_ATTRIBUTES && !(dwAttrib & FILE_ATTRIBUTE_DIRECTORY));
}

PBYTE ReadFileW(LPCWSTR filename, PDWORD fileSize) {

    // Open the file with read permissions
    HANDLE hFile = CreateFileW(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        return NULL;
    }

    // Retrieve the file size (files larger than 4 GB are not handled here)
    *fileSize = GetFileSize(hFile, NULL);
    if (*fileSize == INVALID_FILE_SIZE) {
        CloseHandle(hFile);
        return NULL;
    }
    DWORD sizeRead = 0;

    // Allocate a heap buffer to hold the file content
    PBYTE content = (PBYTE)malloc(*fileSize);
    if (!content) {
        CloseHandle(hFile);
        return NULL;
    }

    // Populate the allocated buffer with the file content
    BOOL result = ReadFile(hFile, content, *fileSize, &sizeRead, NULL);
    if (!result || sizeRead != *fileSize) {
        DEBUG("[x] Error during %ls file read\n", filename);
        free(content);
        content = NULL;
    }
    CloseHandle(hFile);

    // Return the buffer (NULL on error); the caller must free it
    return content;
}

So now the whole DLL content is stored in memory and is ready to be injected.

Inject the DLL

To inject anything in memory, you need to know the total size of the thing you want to inject. For PE files, such as DLLs, the size of the image once mapped in memory can be found in the NT headers.

The goal of this post is not to write a PE parser, so the fields of the PE format will not all be documented explicitly here.

The full DLL size can be found with the following code that retrieves the OptionalHeader.SizeOfImage:

// dllContent points to the first byte of the DLL stored in memory
DWORD getImageSize(PVOID dllContent){
    IMAGE_DOS_HEADER* dosHeader = (IMAGE_DOS_HEADER*)dllContent;
    IMAGE_NT_HEADERS* ntHeaders = (IMAGE_NT_HEADERS*)((PBYTE)dllContent + dosHeader->e_lfanew);
    return ntHeaders->OptionalHeader.SizeOfImage;
}
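Because dllContent may be a truncated or malformed buffer, it can be worth validating the headers before trusting SizeOfImage. The following is a minimal portable sketch, not the author's code: the helper name pe_size_of_image and the bufferSize parameter are assumptions introduced for illustration. The offsets used (e_lfanew at 0x3C, SizeOfImage 56 bytes into the optional header for both PE32 and PE32+) come from the PE format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_u32(const uint8_t *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns SizeOfImage, or 0 if the buffer does not look like a PE file.
 * Hypothetical helper: the name and signature are illustrative only. */
uint32_t pe_size_of_image(const uint8_t *buf, size_t bufferSize) {
    if (buf == NULL || bufferSize < 0x40) return 0;
    if (buf[0] != 'M' || buf[1] != 'Z') return 0;            /* DOS signature */
    uint32_t e_lfanew = read_u32(buf + 0x3C);
    /* Signature (4) + file header (20) + optional header up to SizeOfImage (56 + 4) */
    if (e_lfanew > bufferSize || bufferSize - e_lfanew < 4 + 20 + 60) return 0;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0) return 0;  /* NT signature */
    return read_u32(buf + e_lfanew + 4 + 20 + 56);
}
```

A return value of 0 can then be treated as "not a usable PE image" before any allocation is attempted.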

Then, once the full size is known, a memory region of that size is allocated using VirtualAlloc:

PVOID startAddress = VirtualAlloc(
    NULL, 
    getImageSize(dllContent), 
    MEM_COMMIT | MEM_RESERVE, 
    PAGE_READWRITE
);
if (!startAddress) {
    return FALSE;
}

The allocated region must now be populated with the DLL's headers and sections:

IMAGE_DOS_HEADER* dosHeader = (IMAGE_DOS_HEADER*)dllContent;
IMAGE_NT_HEADERS* ntHeader = (IMAGE_NT_HEADERS*)((PBYTE)dllContent + dosHeader->e_lfanew);

// Copy the headers
CopyMemory(startAddress, dllContent, ntHeader->OptionalHeader.SizeOfHeaders);

// Copy the sections: each one goes from its file offset to its RVA
PIMAGE_SECTION_HEADER sectionHeader = IMAGE_FIRST_SECTION(ntHeader);
for (DWORD i = 0; i < ntHeader->FileHeader.NumberOfSections; i++, sectionHeader++) {
    CopyMemory(
        (PBYTE)startAddress + sectionHeader->VirtualAddress, 
        (PBYTE)dllContent + sectionHeader->PointerToRawData, 
        sectionHeader->SizeOfRawData
    );
}
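The copy loop above trusts every offset found in the section headers. As a hedged illustration of the same mapping step with bounds checks, here is a small portable sketch; the SectionInfo type and the copy_sections function are hypothetical names that only keep the three fields the loop uses (the real layout is IMAGE_SECTION_HEADER).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified view of the section header fields used for mapping.
 * Hypothetical type: the real structure is IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t VirtualAddress;   /* RVA of the section once mapped */
    uint32_t SizeOfRawData;    /* number of bytes stored in the file */
    uint32_t PointerToRawData; /* file offset of those bytes */
} SectionInfo;

/* Copy every section from the raw file into the mapped image.
 * Returns 1 on success, 0 if a section falls outside either buffer. */
int copy_sections(uint8_t *image, size_t imageSize,
                  const uint8_t *file, size_t fileSize,
                  const SectionInfo *sections, size_t count) {
    assert(image != NULL && file != NULL && sections != NULL);
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        /* Reject sections whose source range is out of bounds */
        if (s->PointerToRawData > fileSize ||
            s->SizeOfRawData > fileSize - s->PointerToRawData) return 0;
        /* Reject sections whose destination range is out of bounds */
        if (s->VirtualAddress > imageSize ||
            s->SizeOfRawData > imageSize - s->VirtualAddress) return 0;
        memcpy(image + s->VirtualAddress, file + s->PointerToRawData,
               s->SizeOfRawData);
    }
    return 1;
}
```

The subtraction form of each check avoids the integer overflow that a naive `offset + size > limit` comparison could hit.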

At the end of these steps, the whole DLL is loaded in memory at the address startAddress.

In a better world, this could be the end and the DLL entry point could be run straight away. However, there is a high probability that your DLL has not been loaded to its preferred base address.

Thus, several relocations must be performed before the DLL can be run.

Perform image base relocations

When a PE is compiled, the symbols are referenced as if the PE is loaded at a given base address. This address is the PE preferred loading address or the ImageBase address.

If the PE is not loaded at its preferred ImageBase address, the shift in load address breaks all absolute references within the PE.

For example, if the PE has been compiled to reference the data A at the address 0x40000, this reference only works if the PE is loaded at the same base address as the one used during compilation. If the PE is loaded at another base address, the reference 0x40000 will no longer point to the A data but to arbitrary data in the process memory space. The A data will be stored at 0x40000 + offset, where the offset is the difference between the current load address and the preferred ImageBase address.
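The arithmetic can be sketched in a few lines. The function name rebase_address and the base addresses used in the usage note are made up for illustration and do not come from a real DLL.

```c
#include <assert.h>
#include <stdint.h>

/* Rebase an absolute address that was computed for preferredBase.
 * Unsigned arithmetic wraps around, so the same formula also works
 * when the image is loaded below its preferred base. */
uint64_t rebase_address(uint64_t address, uint64_t preferredBase,
                        uint64_t actualBase) {
    assert(address >= preferredBase); /* the address must belong to the image */
    uint64_t offset = actualBase - preferredBase;
    return address + offset;
}
```

With a hypothetical preferred base of 0x10000000 and an actual load address of 0x20000000, a hardcoded reference to 0x10040000 becomes 0x20040000.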

The .reloc section lists the locations of all the hardcoded absolute references that need fixing if the PE is not loaded at its preferred base address.

In a nutshell, absolute addresses are hardcoded in the image (in .text, .data, etc.) as if the PE were loaded at its ImageBase. The .reloc section does not hold these addresses itself: it records where each of them is stored, so that the loader can patch every one of them by adding the load offset.

Symbol Table

The .reloc section contains the Relocation Table. This table is divided into blocks, each of which describes the base relocations of one 4 KB page.

The RVA of the Relocation Table can be retrieved from the PE DataDirectory located in the NtHeader's OptionalHeader field. As it is an RVA, the load address must be added to it:

PVOID firstRelocationBlock = (PBYTE)startAddress + ntHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;

Each block starts with an IMAGE_BASE_RELOCATION header and is followed by any number of 16-bit entries, each made of a type field and an offset field.

The IMAGE_BASE_RELOCATION part can be parsed using the following structure:

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION;

The VirtualAddress field is the RVA of the page covered by the block, and SizeOfBlock is the total size of the block in bytes, header included. The relocation entries therefore start right after the IMAGE_BASE_RELOCATION header and run up to the block start address + SizeOfBlock. The relocation table can thus be seen as a sequence of blocks, each made of a header immediately followed by its entries.

The different IMAGE_BASE_RELOCATION blocks can be iterated using the following code:

//pseudo code
IMAGE_BASE_RELOCATION* currentRelocation = (IMAGE_BASE_RELOCATION*)firstRelocationBlock;
while(currentRelocation->VirtualAddress){
    // Process the current relocation block
    ...
    // Jump to the next relocation block
    currentRelocation = (IMAGE_BASE_RELOCATION*)((PBYTE)currentRelocation + currentRelocation->SizeOfBlock);
}

The next step is to access each entry of a relocation block. These entries can be parsed using the following structure:

typedef struct _IMAGE_RELOCATION_ENTRY {
    WORD Offset : 12;
    WORD Type : 4;
} IMAGE_RELOCATION_ENTRY;
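The bit-field structure above relies on the compiler placing Offset in the low bits, which is the usual behavior of Windows compilers but is implementation-defined in C. As an alternative sketch, an entry can be decoded with explicit shifts and masks; DecodedReloc, decode_reloc_entry and encode_reloc_entry are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  type;   /* high 4 bits: relocation type */
    uint16_t offset; /* low 12 bits: offset from the block's VirtualAddress */
} DecodedReloc;

/* Split a raw 16-bit relocation entry into its two fields. */
DecodedReloc decode_reloc_entry(uint16_t raw) {
    DecodedReloc d;
    d.type = (uint8_t)(raw >> 12);
    d.offset = (uint16_t)(raw & 0x0FFF);
    return d;
}

/* Inverse operation, handy to build test tables. */
uint16_t encode_reloc_entry(uint8_t type, uint16_t offset) {
    assert(type <= 0xF && offset <= 0x0FFF);
    return (uint16_t)((type << 12) | offset);
}
```

For instance, the raw entry 0xA123 decodes to type 10 and offset 0x123.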

Each entry contained in the relocation block can be parsed using the following code:

// pseudo code
// IMAGE_BASE_RELOCATION *currentRelocation : the current relocation block
// The entries start right after the block header
IMAGE_RELOCATION_ENTRY* relocationEntry = (IMAGE_RELOCATION_ENTRY*)&currentRelocation[1];
while ((DWORD64)relocationEntry < (DWORD64)currentRelocation + currentRelocation->SizeOfBlock){
    // process the relocation
    ...
    // jump to the next entry
    relocationEntry++;
}

So, to summarize, the following code can be used to parse a full relocation table:

//pseudo code
// PVOID startAddress : the actual PE load address
IMAGE_BASE_RELOCATION* currentRelocation = (IMAGE_BASE_RELOCATION*)((PBYTE)startAddress + ntHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
while(currentRelocation->VirtualAddress){
    // process the current relocation block
    IMAGE_RELOCATION_ENTRY* relocationEntry = (IMAGE_RELOCATION_ENTRY*)&currentRelocation[1];
    while ((DWORD64)relocationEntry < (DWORD64)currentRelocation + currentRelocation->SizeOfBlock){
        // process the relocation
        ...
        // jump to the next entry
        relocationEntry++;
    }
    // jump to the next relocation block
    currentRelocation = (IMAGE_BASE_RELOCATION*)((PBYTE)currentRelocation + currentRelocation->SizeOfBlock);
}
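To make the block layout concrete, the walk can also be written against a plain byte buffer. This is only a sketch under stated assumptions: count_relocations is a hypothetical helper, the loop is bounded by the table size (the Size field of the data directory) rather than by a zero VirtualAddress, and the memcpy reads assume a little-endian host, as on x86 and x64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the non-padding entries of a relocation table held in `table`.
 * Returns -1 if a block header is malformed. Illustrative helper only. */
long count_relocations(const uint8_t *table, size_t tableSize) {
    assert(table != NULL);
    long count = 0;
    size_t pos = 0;
    while (tableSize - pos >= 8) {
        uint32_t pageRva, blockSize;
        memcpy(&pageRva, table + pos, 4);       /* VirtualAddress */
        memcpy(&blockSize, table + pos + 4, 4); /* SizeOfBlock */
        if (blockSize == 0) break;              /* terminator */
        if (blockSize < 8 || blockSize > tableSize - pos) return -1;
        /* Entries are 16-bit values located right after the 8-byte header */
        for (size_t off = 8; off + 2 <= blockSize; off += 2) {
            uint16_t raw;
            memcpy(&raw, table + pos + off, 2);
            if ((raw >> 12) != 0) count++;      /* type 0 is padding */
        }
        (void)pageRva; /* a real loader adds it to each entry offset */
        pos += blockSize;
    }
    return count;
}
```

Checking SizeOfBlock against the remaining bytes keeps a corrupted table from sending the walk outside the buffer.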

Perform relocation

Performing a relocation means patching the value stored at each address designated by the Relocation Table entries, by adding to it an offset equal to the difference between the real PE load address and the PE preferred load address.

So, the relocation address must be found, and its content must be updated. In order to update the relocation content, the relocation type must be taken into account. Indeed, depending on the relocation type, the modification can be handled differently:

Name	Value	Description
IMAGE_REL_BASED_ABSOLUTE	0x00	The relocation is skipped. It is often used to pad a block
IMAGE_REL_BASED_HIGH	0x01	The 16 high bits of the offset are added to the 16-bit value at the relocation address
IMAGE_REL_BASED_LOW	0x02	The 16 low bits of the offset are added to the 16-bit value at the relocation address
IMAGE_REL_BASED_HIGHLOW	0x03	All 32 bits of the offset are added to the 32-bit value at the relocation address
IMAGE_REL_BASED_DIR64	0x0A	All 64 bits of the offset are added to the 64-bit value at the relocation address

The following code can be used to handle the relocations:

// pseudo code 
// PVOID startAddress : the actual PE load address
// IMAGE_BASE_RELOCATION* currentRelocation : the current relocation block

// Compute the offset between the actual and the preferred load address
DWORD64 offset = (DWORD64)startAddress - ntHeader->OptionalHeader.ImageBase;

// Get the first relocation entry in the block and the end of the block
IMAGE_RELOCATION_ENTRY* relocationEntry = (IMAGE_RELOCATION_ENTRY*)&currentRelocation[1];
PBYTE blockEnd = (PBYTE)currentRelocation + currentRelocation->SizeOfBlock;

// Parse all relocation entries in the relocation block
while((PBYTE)relocationEntry < blockEnd){
    // Get the relocation address
    DWORD64 relocationRVA = currentRelocation->VirtualAddress + relocationEntry->Offset;
    PBYTE relocationAddress = (PBYTE)startAddress + relocationRVA;

    // Process the relocation, writing only as many bytes as the type covers
    switch(relocationEntry->Type){
        case IMAGE_REL_BASED_HIGH:
            // 16 high bits
            *(WORD*)relocationAddress += HIWORD(offset);
            break;
        case IMAGE_REL_BASED_LOW:
            // 16 low bits
            *(WORD*)relocationAddress += LOWORD(offset);
            break;
        case IMAGE_REL_BASED_HIGHLOW:
            // 32 bits
            *(DWORD*)relocationAddress += (DWORD)offset;
            break;
        case IMAGE_REL_BASED_DIR64:
            // 64 bits
            *(DWORD64*)relocationAddress += offset;
            break;
        default:
            // IMAGE_REL_BASED_ABSOLUTE (padding) and unsupported types are skipped
            break;
    }
    relocationEntry++;
}
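The width of the patched value matters: a 32-bit relocation must not overwrite the four bytes that follow it. The sketch below shows one way to apply a single relocation to a byte buffer with bounds checks. The apply_relocation name and the REL_* enum are illustrative, the numeric values follow the PE format (HIGHLOW is 3, DIR64 is 10), only the two most common types are handled, and the memcpy accesses assume a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REL_ABSOLUTE = 0, REL_HIGHLOW = 3, REL_DIR64 = 10 };

/* Patch one relocation target inside `image`.
 * Returns 1 if handled, 0 if the type is unsupported or out of bounds. */
int apply_relocation(uint8_t *image, size_t imageSize, uint32_t rva,
                     unsigned type, uint64_t offset) {
    assert(image != NULL);
    if (type == REL_ABSOLUTE) return 1;  /* padding, nothing to do */
    size_t width = (type == REL_HIGHLOW) ? 4 : (type == REL_DIR64) ? 8 : 0;
    if (width == 0 || rva > imageSize || width > imageSize - rva) return 0;
    if (width == 4) {
        uint32_t value;
        memcpy(&value, image + rva, 4);  /* unaligned-safe read */
        value += (uint32_t)offset;
        memcpy(image + rva, &value, 4);
    } else {
        uint64_t value;
        memcpy(&value, image + rva, 8);
        value += offset;
        memcpy(image + rva, &value, 8);
    }
    return 1;
}
```

Using memcpy instead of a pointer cast sidesteps alignment assumptions, since relocation targets inside code are not guaranteed to be aligned.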

At this point, all relocations have been performed and the program should work whatever its load address.

However, what happens if the DLL uses functions defined in another DLL? In this case, that DLL must be loaded and the addresses of its functions must be resolved.

Process imported functions

DLLs have dependencies: a DLL can use functions defined in other DLLs. For example, KERNEL32.DLL depends on NTDLL.DLL: VirtualAlloc will call NtAllocateVirtualMemory.

Thus, these references must also be resolved.

Import Directory Table

The Import Directory Table is located in the .idata section and contains one entry for every DLL used by the PE.

The Import Directory Table entry can be parsed using the following structure:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;

Once the name of the dependency DLL is known, the DLL must be loaded, either through a recursive call to the reflective loader or through LoadLibrary. However, the use of LoadLibrary is somewhat problematic as it will raise several kernel events.

Then, the Import Lookup Table (ILT) must be parsed to fill the Import Address Table (IAT).

IAT and ILT

The IAT is a table that will contain the addresses of the resolved symbols, i.e. the function addresses resolved through GetProcAddress. On disk, this table usually holds the same content as the ILT; it is overwritten with the resolved addresses at load time, based on the ILT information.

The ILT is a lookup table containing information about each imported function, such as its name or its ordinal. The ILT information is used to resolve the imported function address that will be stored in the IAT.

The ILT and IAT entries can be parsed using the following structure:

typedef struct _IMAGE_THUNK_DATA {
    union {
        ULONGLONG ForwarderString;
        ULONGLONG Function;       
        ULONGLONG Ordinal;
        ULONGLONG AddressOfData;  
    } u1;
} IMAGE_THUNK_DATA;

For the ILT, the interesting fields are:

The AddressOfData field, which contains the RVA of the function name. This RVA points to an IMAGE_IMPORT_BY_NAME structure that can be used to retrieve the function name as a simple string. This string can then be used with GetProcAddress to retrieve the function address.
The Ordinal field, which contains the function ordinal and can be used to resolve the function address through GetProcAddress.

For the IAT, the interesting field is:

The Function field, which will receive the resolved function address.

Thus, the idea is to parse the whole ILT, use the AddressOfData or Ordinal field to resolve the function address, and write it in the Function field of the matching IAT entry.
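How the loader tells an ordinal import from a named import can be shown with a short sketch for the 64-bit format, where the most significant bit of the entry is the ordinal flag (this is what the IMAGE_SNAP_BY_ORDINAL macro tests). The IltEntry type and decode_ilt_entry64 function are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      byOrdinal; /* 1 if the function is imported by ordinal */
    uint16_t ordinal;   /* valid when byOrdinal is 1 */
    uint32_t nameRva;   /* RVA of IMAGE_IMPORT_BY_NAME when byOrdinal is 0 */
} IltEntry;

/* Decode one non-zero 64-bit ILT entry. */
IltEntry decode_ilt_entry64(uint64_t raw) {
    assert(raw != 0); /* a zero entry terminates the table */
    IltEntry e = {0, 0, 0};
    if (raw >> 63) {
        e.byOrdinal = 1;
        e.ordinal = (uint16_t)(raw & 0xFFFF);     /* low 16 bits */
    } else {
        e.nameRva = (uint32_t)(raw & 0x7FFFFFFF); /* low 31 bits */
    }
    return e;
}
```

For example, the entry 0x8000000000000010 designates ordinal 16, while 0x2A40 is the RVA of a name structure.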

Fill the IAT

The following code can be used to process the imported functions:

// pseudo code
// PVOID startAddress : the actual PE load address
PBYTE base = (PBYTE)startAddress;

// Get the first Import Directory Table entry
IMAGE_IMPORT_DESCRIPTOR* importDescriptor = (IMAGE_IMPORT_DESCRIPTOR*)(base + ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);

// Iterate through all Import Directory Table entries (the last one is zeroed)
for (; importDescriptor->Name; importDescriptor++) {
    // Get the IAT and ILT first entry
    PIMAGE_THUNK_DATA iat = (PIMAGE_THUNK_DATA)(base + importDescriptor->FirstThunk);
    PIMAGE_THUNK_DATA ilt = (PIMAGE_THUNK_DATA)(base + importDescriptor->OriginalFirstThunk);

    // Get the associated DLL name
    char* dllName = (char*)(base + importDescriptor->Name);

    // Load the DLL
    // For readability, the LoadLibrary function is used
    HMODULE dllHandle = GetModuleHandleA(dllName);
    if (!dllHandle) {
        dllHandle = LoadLibraryExA(dllName, NULL, 0);
        if (!dllHandle) {
            return FALSE;
        }
    }

    // Iterate through the ILT entries (a zeroed entry ends the table)
    for (; ilt->u1.AddressOfData; iat++, ilt++) {
        // Check if the function is given as an ordinal
        if (IMAGE_SNAP_BY_ORDINAL(ilt->u1.Ordinal)) {
            // GetProcAddress accepts an ordinal in the low word of its name parameter
            LPCSTR functionOrdinal = (LPCSTR)IMAGE_ORDINAL(ilt->u1.Ordinal);
            // Write function address into the IAT's corresponding entry
            iat->u1.Function = (DWORD_PTR)GetProcAddress(dllHandle, functionOrdinal);
        }
        else {
            // Load the HINT structure from the ILT information 
            // The HINT structure contains the function name
            IMAGE_IMPORT_BY_NAME* hint = (IMAGE_IMPORT_BY_NAME*)(base + ilt->u1.AddressOfData);
            // Write function address into the IAT's corresponding entry
            iat->u1.Function = (DWORD_PTR)GetProcAddress(dllHandle, hint->Name);
        }
    }
}

At this point, the DLL dependencies are loaded and the IAT is populated.

Delayed Import Table

The Delayed Import Table works exactly like the standard Import Directory Table. This table was added to provide a uniform mechanism for applications to delay the loading of a DLL until one of its exported functions is first used.

To avoid any problem, this table is processed exactly like the Import Directory Table. The only difference is that its entries are parsed using the following structure:

typedef struct _IMAGE_DELAYLOAD_DESCRIPTOR {
    union {
        DWORD AllAttributes;
        struct {
            DWORD RvaBased : 1;
            DWORD ReservedAttributes : 31;
        } DUMMYSTRUCTNAME;
    } Attributes;

    DWORD DllNameRVA;
    DWORD ModuleHandleRVA;
    DWORD ImportAddressTableRVA;
    DWORD ImportNameTableRVA;
    DWORD BoundImportAddressTableRVA;
    DWORD UnloadInformationTableRVA;
    DWORD TimeDateStamp;
} IMAGE_DELAYLOAD_DESCRIPTOR;

The DllNameRVA value contains the RVA of the DLL name.

The ImportAddressTableRVA contains the IAT's RVA. Likewise, the ImportNameTableRVA contains the ILT's RVA.

Apart from that, nothing changes and the previous code used to parse the Import Directory Table can be used as is.
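The blueprint also lists a step that protects each memory section according to its needs, between import resolution and the TLS callbacks. As a hedged sketch of that step, the function below maps the Characteristics flags of a section to a page protection constant, which would then be applied to the section with VirtualProtect. The numeric values are copied from the Windows SDK (IMAGE_SCN_MEM_* and PAGE_*) under shortened names so that the sketch compiles anywhere; section_protection is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the Windows SDK headers */
#define SCN_MEM_EXECUTE      0x20000000u
#define SCN_MEM_READ         0x40000000u
#define SCN_MEM_WRITE        0x80000000u
#define PG_NOACCESS          0x01u
#define PG_READONLY          0x02u
#define PG_READWRITE         0x04u
#define PG_EXECUTE           0x10u
#define PG_EXECUTE_READ      0x20u
#define PG_EXECUTE_READWRITE 0x40u

/* Map section characteristics to a page protection constant. */
uint32_t section_protection(uint32_t characteristics) {
    int x = (characteristics & SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & SCN_MEM_READ) != 0;
    int w = (characteristics & SCN_MEM_WRITE) != 0;
    /* Indexed as [execute][read][write]; there is no write-only
     * protection, so writable sections are mapped as read/write. */
    static const uint32_t table[2][2][2] = {
        { { PG_NOACCESS, PG_READWRITE }, { PG_READONLY, PG_READWRITE } },
        { { PG_EXECUTE, PG_EXECUTE_READWRITE },
          { PG_EXECUTE_READ, PG_EXECUTE_READWRITE } }
    };
    uint32_t protection = table[x][r][w];
    assert(protection != 0);
    return protection;
}
```

A typical .text section (execute and read) maps to PAGE_EXECUTE_READ, and a typical .data section (read and write) maps to PAGE_READWRITE.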

TLS Callback and DLL Main

TLS Callback

TLS callbacks are functions that run before the entry point. They are often used by malware as an anti-debug technique, but also by legitimate DLLs to set up their environment.

These callbacks must be run before the DLL main. They can be found through the IMAGE_DIRECTORY_ENTRY_TLS entry of the PE DataDirectory. The following code can be used to run these functions:

// pseudo code
// dllParsed is the author's parsed-PE structure (its definition is not shown in this post)

// Check if a TLS directory is defined
if (dllParsed->dataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].Size) {

    // Get the TLS information
    PIMAGE_TLS_DIRECTORY tlsDir = (PIMAGE_TLS_DIRECTORY)((PBYTE)startAddress + dllParsed->dataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress);

    // AddressOfCallBacks is an absolute address (already fixed by the relocations), not an RVA
    PIMAGE_TLS_CALLBACK* callback = (PIMAGE_TLS_CALLBACK*)(tlsDir->AddressOfCallBacks);
    for (; callback && *callback; callback++) {
        // Call the function
        (*callback)((PVOID)dllParsed->baseAddress, DLL_PROCESS_ATTACH, NULL);
    }
}

DLL Main

The DLL main function is the function called when the DLL is loaded by the system. The following code is an example of DLL Main:

BOOL WINAPI DllMain(
    HINSTANCE hinstDLL,  // handle to DLL module
    DWORD fdwReason,     // reason for calling function
    LPVOID lpvReserved )  // reserved
{
    // Perform actions based on the reason for calling.
    switch( fdwReason ) 
    { 
        case DLL_PROCESS_ATTACH:
         // Initialize once for each new process.
         // Return FALSE to fail DLL load.
            break;

        case DLL_THREAD_ATTACH:
         // Do thread-specific initialization.
            break;

        case DLL_THREAD_DETACH:
         // Do thread-specific cleanup.
            break;

        case DLL_PROCESS_DETACH:

            if (lpvReserved != nullptr)
            {
                break; // do not do cleanup if process termination scenario
            }

         // Perform any necessary cleanup.
            break;
    }
    return TRUE;  // Successful DLL_PROCESS_ATTACH.
}

When the DLL is loaded, the event DLL_PROCESS_ATTACH is used. The following code can be used to call the DLL main:

// pseudo code

typedef BOOL(WINAPI* LPDLLMAIN)(HINSTANCE hinstDLL, DWORD reason, LPVOID reserved);

// AddressOfEntryPoint is an RVA; it is 0 when the DLL has no entry point
DWORD entryPointRVA = dllParsed->ntHeader->OptionalHeader.AddressOfEntryPoint;
if (entryPointRVA) {
    LPDLLMAIN entryPoint = (LPDLLMAIN)((PBYTE)startAddress + entryPointRVA);
    BOOL status = entryPoint((HINSTANCE)dllParsed->baseAddress, DLL_PROCESS_ATTACH, NULL);
}

From now on, the DLL should be fully loaded and its exported functions can be used.

GetProcAddress

The Windows GetProcAddress function does not work with the loaded DLL, and I was not able to understand why. Indeed, GetProcAddress requires a handle to the DLL (HMODULE), and the HMODULE type simply represents the DLL base address. However, even when the base address of the loaded DLL is given to GetProcAddress, it cannot find the exported function. A plausible explanation is that the manually mapped DLL is not registered in the loader's list of loaded modules, so its base address is not recognized as a valid module handle.

Thus, a custom GetProcAddress function must be developed.

The PE header contains all the information needed to access the names and addresses of the exported functions. Indeed, the DataDirectory contains the ExportDirectory RVA, which gives access to the AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals tables.

AddressOfNames is a table containing the RVAs of the names of the exported functions. AddressOfFunctions is a table containing the RVAs of the exported functions. AddressOfNameOrdinals is a lookup table that links each name from AddressOfNames to its index in AddressOfFunctions.

The following code can be used as a custom GetProcAddress:

// pseudo code
// PVOID startAddress : the actual PE load address
// LPCSTR functionName : the name of the exported function to find
PBYTE base = (PBYTE)startAddress;

DWORD exportDirectoryRVA = (DWORD)dllParsed->dataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
IMAGE_EXPORT_DIRECTORY* exportDirectory = (IMAGE_EXPORT_DIRECTORY*)(base + exportDirectoryRVA);
LPDWORD AddressOfFunctions = (LPDWORD)(base + exportDirectory->AddressOfFunctions);
LPDWORD AddressOfNames = (LPDWORD)(base + exportDirectory->AddressOfNames);
LPWORD AddressOfNameOrdinals = (LPWORD)(base + exportDirectory->AddressOfNameOrdinals);

// Only NumberOfNames exports have a name; the others are exported by ordinal only
for (DWORD i = 0; i < exportDirectory->NumberOfNames; i++) {
    char *name = (char*)(base + AddressOfNames[i]);
    if (strcmp(name, functionName) == 0) {
        // Forwarded exports are not handled here
        DWORD functionRVA = AddressOfFunctions[AddressOfNameOrdinals[i]];
        return base + functionRVA;
    }
}
// The function name was not found
return NULL;
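The indirection between the three export tables is easier to see on plain arrays. The sketch below is a simplified, portable illustration and not a drop-in replacement: find_export_rva is a hypothetical helper, and the names are passed as C strings, whereas a real export directory stores name RVAs that must first be added to the load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the function RVA for `target`, or 0 if the name is not exported.
 * names[i] pairs with nameOrdinals[i], which indexes into functions[]. */
uint32_t find_export_rva(const char *const *names, const uint16_t *nameOrdinals,
                         size_t numberOfNames, const uint32_t *functions,
                         size_t numberOfFunctions, const char *target) {
    assert(names != NULL && nameOrdinals != NULL);
    assert(functions != NULL && target != NULL);
    for (size_t i = 0; i < numberOfNames; i++) {
        if (strcmp(names[i], target) == 0) {
            uint16_t index = nameOrdinals[i];
            /* Guard against an index outside the function table */
            return index < numberOfFunctions ? functions[index] : 0;
        }
    }
    return 0;
}
```

Note that the loop runs over the number of names, not the number of functions: a DLL can export more functions than names when some are exported by ordinal only.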
